Nucleotide sequences coding for a thermostable DNA
polymerase, DNA polymerase and uses thereof

Specification 

The present invention concerns the isolation
and identification of sequences coding for a DNA
polymerase from bacteria belonging to the Archaea
domain (Woese C.R. et al. 1990, Proc. Natl.
Acad. Sci. USA 87, 4576-4579), the protein coded
by said sequences and uses thereof.

DNA polymerases are enzymes responsible for
the duplication of genomic DNA and, therefore, for
the inheritance of the genetic material. Sequences
coding for DNA polymerase from bacteria belonging to
the Archaea domain are not known in the prior art.
Such bacteria are adapted to grow at high
temperatures, and are evolutionarily distant from
Eubacteria.

DNA polymerases may be classified in two
classes (Ito, J., and Braithwaite, D.K. (1991)
Nucleic Acids Res. 19, 4045-4057). Class A
comprises enzymes that are sensitive to
dideoxynucleotide inhibition and resistant to
aphidicolin, such as pol I from E. coli (Joyce, C.M.,
Kelley, W.S., and Grindley, N.D.F. (1982) J. Biol.
Chem. 257, 1958-1964); class B is more
heterogeneous, comprising enzymes that are
sensitive to aphidicolin and partially resistant to
dideoxynucleotide inhibition.

The authors of the instant invention have
demonstrated, by means of gel filtration
chromatography and glycerol gradient
centrifugation, that the DNA polymerase extracted
from bacteria of the thermostable and thermophilic
Sulfolobus solfataricus species has a molecular
weight of around 100 kDa. Electrophoresis on
polyacrylamide gel under denaturing conditions
shows, besides the 100 kDa protein, two major bands
of 50 and 40 kDa respectively. These bands
represent proteolytic cleavage fragments of the
100 kDa protein, since they react with antisera
raised against the native 100 kDa protein.
Moreover, the 50 kDa fragment retains DNA
polymerase activity (Karawya, E., Swack, J.A., and
Wilson, S.H. (1983) Anal. Biochem. 135, 318-325).

The authors of the present invention have
isolated and sequenced the gene coding for the DNA
polymerase from S. solfataricus, and have deduced
the amino acid sequence of the protein. Upon
insertion into prokaryotic or eukaryotic expression
vectors and transformation of suitable hosts, the
gene makes it possible to produce the DNA
polymerase enzyme through recombinant DNA
techniques.

According to the invention the term
"thermophilic" refers to enzymes with a peak of
activity at temperatures between 50°C and 85°C,
preferably 75°C, when activated calf thymus DNA is
used as substrate; the term "thermostable" refers
to the fact that the enzyme retains 100% of its
activity after incubation for 40 min at 75°C.

An object of the invention is a nucleic
acid of natural, recombinant or synthetic origin,
comprising a nucleotide sequence coding for a
polypeptide or fragments thereof having a
thermostable and thermophilic DNA polymerase
activity. Preferably said nucleotide sequence is
derived from DNA of bacteria of the Archaea domain,
preferably of the Sulfolobus genus, more preferably
of the S. solfataricus species.



In a preferred embodiment said polypeptide
or fragments thereof also have a 3'-5' exonuclease
activity.

Preferably said nucleotide sequence codes for
the polypeptide having the amino acid sequence of
SEQ ID N2, or fragments thereof, optionally with
one or more amino acids deleted or substituted,
provided that said DNA polymerase activity is
maintained.

A further object of the invention is a nucleic
acid comprised in the sequence of SEQ ID N1,
characterized in that nucleotides 1 to 197 are a
non-coding sequence, nucleotides 198 to 2843 code
for a polypeptide with a thermostable and
thermophilic DNA polymerase activity, and
nucleotides 2844 to 3112 are a non-coding sequence.
Alternatively, said nucleotide sequence has one or
more nucleotides deleted or substituted, provided
that said DNA polymerase activity is maintained.
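
The coordinates quoted for SEQ ID N1 can be cross-checked with simple arithmetic. The following is only an illustrative sketch: the function and variable names are hypothetical, the figure of 882 codons is the one given in Example 2, and the reading of the CDS feature location 198..2846 in the sequence listing as "coding region plus stop codon" is an assumption, not a statement made in the text.

```python
# Cross-check of the SEQ ID N1 coordinates quoted in the text.
# All names are illustrative; only the numbers come from the document.

TOTAL_LENGTH = 3112   # length of SEQ ID N1 in base pairs
CODING_START = 198    # first nucleotide of the coding region
CODING_END = 2843     # last nucleotide of the coding region
CODONS = 882          # open reading frame length given in Example 2


def region_lengths(total, start, end):
    """Lengths of the 5' non-coding, coding and 3' non-coding regions.

    Coordinates are 1-based and inclusive, as in the sequence listing.
    """
    five_prime = start - 1
    coding = end - start + 1
    three_prime = total - end
    return five_prime, coding, three_prime


five_prime, coding, three_prime = region_lengths(TOTAL_LENGTH, CODING_START, CODING_END)

print(five_prime, coding, three_prime)  # 197 2646 269
print(coding == CODONS * 3)             # True

# Assumption: the CDS feature (198..2846) additionally counts the stop codon.
print(CODING_END + 3)                   # 2846
```

Under these assumptions the three regions add up to the full 3112 base pairs, and the coding region is exactly 882 codons long.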

Another object of the invention are
nucleotide sequences able to hybridize at medium
stringency to nucleotide sequences of the
invention; preferably said sequences are
complementary to the sequences of the invention.

Another object of the invention is a
polypeptide with a thermostable and thermophilic DNA
polymerase activity, preferably produced through
recombinant DNA techniques from nucleotide sequences
according to the invention, preferably from the
nucleotide sequence comprised in SEQ ID N1.

According to the invention said polypeptide
has a sequence comprised in SEQ ID N2.

A further object of the invention are
recombinant cloning or expression vectors, of
plasmid or viral derivation, comprising the
nucleotide sequences of the invention; preferably
said vector is the plasmid pFCpolS (DSM N.7091).

Another object are cells transformed with 
said vectors. 

The invention will be described in the
following examples, with reference to the
following figures:

figure 1, which represents a restriction map
of the coding region of the DNA polymerase gene of
S. solfataricus;

figures 2a and 2b, which represent a sequence
analysis of DNA polymerase sequences from
different organisms.

Example 1 Partial amino acid sequence of DNA
polymerase from S. solfataricus

30 µg of DNA polymerase purified from S.
solfataricus, as described in Rossi M. et al. 1986,
System. Appl. Microbiol. 7, 337-341, is loaded on
a 10% polyacrylamide gel under denaturing conditions.
The gel is then electrotransferred onto a PVDF
membrane (ProBlott, Applied Biosystems), as
described in Matsudaira, P. (1987) J. Biol. Chem.
262, 10035-10038. The membrane is stained with
Coomassie Brilliant Blue R-250. Three protein bands
of 100, 50 and 40 kDa are cut out and loaded
directly on a gas-phase amino acid sequencer (model
470 A, Applied Biosystems), with a PTH analyzer 120 A.
The N-terminal sequences of the 50 and 40 kDa
peptides are:
50 kDa GYKGAVVIDP
40 kDa SAPVEEKKVVR

Example 2 Isolation and sequence of the DNA
polymerase gene of S. solfataricus

Using the amino acid sequences, the following
degenerate oligonucleotides are synthesized:
29-mer SSDP50K, corresponding to the N-terminal
sequence of the 50 kDa fragment:
5'-GGA TA(T/C) GG(T/A) GG(T/A) GC(T/A)
GT(T/A) GT(T/A) AT(T/A) GAT CC-3'

23-mer SSDP40K, corresponding to the N-terminal
sequence of the 40 kDa fragment:
5'-GC(T/A) CC(T/A) GT(T/A) GA(A/G) AA(A/G)
AA(A/G) GT-3'.
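
To make the "(T/A)" notation concrete, the sketch below parses the SSDP50K probe as it is written in the text (with spaces removed) and counts how many distinct sequences the degenerate pool contains. It is an illustration only: the helper names are hypothetical and the parsing rule (a parenthesised, slash-separated group stands for one position) is an assumption about the notation.

```python
import re
from itertools import product

# SSDP50K as written in the text, spaces removed.
# "(T/C)" is read as: T or C at that single position (assumed notation).
SSDP50K = "GGATA(T/C)GG(T/A)GG(T/A)GC(T/A)GT(T/A)GT(T/A)AT(T/A)GATCC"


def parse_degenerate(oligo):
    """Return one tuple of allowed bases per position of the oligonucleotide."""
    positions = []
    pattern = r"([ACGT])|\(([ACGT](?:/[ACGT])+)\)"
    for fixed, choice in re.findall(pattern, oligo):
        positions.append((fixed,) if fixed else tuple(choice.split("/")))
    return positions


def degeneracy(positions):
    """Number of distinct sequences represented by the degenerate oligo."""
    count = 1
    for bases in positions:
        count *= len(bases)
    return count


def expand(positions):
    """Yield every concrete sequence contained in the degenerate pool."""
    for combo in product(*positions):
        yield "".join(combo)


positions = parse_degenerate(SSDP50K)
print(len(positions))         # 29
print(degeneracy(positions))  # 128
```

The parsed length of 29 positions agrees with the "29-mer" label; with seven two-fold degenerate positions the pool holds 2^7 = 128 sequences.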

Each oligonucleotide is labelled at its 5'
end with γ-32P ATP by means of T4 polynucleotide
kinase and used to screen a genomic library of S.
solfataricus, strain MT4 (ATCC n. 49155), in the λ
gt11 vector, at the EcoRI site, according to
standard methods. Filter hybridizations are
performed at 45°C with the SSDP40K probe and at 50°C
with the SSDP50K probe, in 6 x saline citrate buffer
(SSC) as described in Maniatis, T., Fritsch, E.F.,
and Sambrook, J. (1989) in Molecular Cloning. A
Laboratory Manual. Cold Spring Harbor Laboratory,
Cold Spring Harbor. Inserts of positive phages pA1,
pC5 and pE1 are subcloned into the EcoRI site of the
pUC18 vector, and sequenced (Sequenase, USB). The
inserts have partially overlapping regions and an
open reading frame, as shown in Fig. 1.
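
As an illustration of how an open reading frame is read, the sketch below translates the first eleven codons of the coding region of SEQ ID N1 (nucleotides 198 to 230 in the sequence listing). It assumes the standard genetic code and uses hypothetical helper names; it is not the procedure used by the authors.

```python
from itertools import product

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
# Assumption: the standard code applies to this gene.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna):
    """Translate a DNA string in frame 1; '*' marks a stop codon."""
    dna = dna.replace(" ", "").upper()
    usable = len(dna) - len(dna) % 3  # ignore an incomplete trailing codon
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


# First eleven codons of the coding region, copied from SEQ ID N1.
first_codons = "ATG ACT AAG CAA CTT ACC TTA TTT GAT ATT CCT"
print(translate(first_codons))  # MTKQLTLFDIP
```

The one-letter output corresponds to Met Thr Lys Gln Leu Thr Leu Phe Asp Ile Pro, the first eleven residues given in the sequence listing.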

Another genomic library is obtained in the λ
EMBL3 vector with MboI partially digested DNA of S.
solfataricus, producing fragments of around 15 kb,
according to standard methods. The library is
screened with the EcoRI insert of the pC5 clone as
probe. Hybridizations are performed on filters at
65°C, 6 x SSC, according to Maniatis, T., Fritsch,
E.F., and Sambrook, J. (1989) in Molecular
Cloning. A Laboratory Manual. Cold Spring Harbor
Laboratory, Cold Spring Harbor. Two positive phages,
λ 4B and λ 2P, are purified and digested with
restriction enzymes (Fig. 1). The EcoRI-PstI
fragment, present in both phages and able to
hybridize with the pE1, pA1 and pC5 clones, is inserted



WO 93/25691 PCT/IT93/00058

into the pEMBL8 vector, producing the plasmid named
pFCpolS (DSM N. 7091). The sequence is shown in
SEQ ID N1. The sequence shows an open reading frame
of 882 codons, in agreement with the 100 kDa
molecular weight of the protein. The 5' non-coding
region does not comprise promoter sequences
homologous to other archaebacterial promoters
(Reiter, W.D., Palm, P., and Zillig, W. (1988)
Nucleic Acids Res. 16, 1-19; Reiter, W.D.,
Hudepohl, U., and Zillig, W. (1990) Proc. Natl.
Acad. Sci. USA 87, 9509-9513). A pyrimidine-rich
region comprising the TTTTTAT sequence is present
at the 3' end of the termination codon, in analogy
with other terminators from Archaea bacteria
(Cubellis, M.V., Rozzo, C., Nitti, G., Arnone, M.I.,
Marino, G., and Sannia, G. (1989) Eur. J. Biochem.
186, 375-381; Cubellis, M.V., Rozzo, C.,
Montecucchi, P., and Rossi, M. (1990) Gene 94,
89-94; Reiter, W.D., Palm, P., and Zillig, W.
(1989) Nucleic Acids Res. 16, 2445-2459).
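
The agreement between an 882-codon open reading frame and a protein of about 100 kDa can be checked with a back-of-the-envelope estimate. The sketch below assumes an average residue mass of roughly 110 Da, a common rule of thumb that does not come from this document; the function name is hypothetical.

```python
# Rough molecular weight estimate from the length of the open reading frame.
# Assumption: ~110 Da average mass per amino acid residue (rule of thumb,
# not a value taken from the text).
AVERAGE_RESIDUE_MASS_DA = 110.0


def approx_mass_kda(n_residues, average_mass_da=AVERAGE_RESIDUE_MASS_DA):
    """Approximate protein mass in kDa for a given number of residues."""
    return n_residues * average_mass_da / 1000.0


orf_codons = 882  # from Example 2
print(round(approx_mass_kda(orf_codons), 1))  # 97.0
```

An estimate of about 97 kDa is in line with the molecular weight of around 100 kDa measured by gel filtration and glycerol gradient centrifugation.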

Example 3 Sequence homology with other DNA 
polymerases 

A sequence analysis shows homologies with
class B DNA polymerases, such as eukaryotic viral
replicases (Gibbs, J.S., Chiou, H.C., Hall, J.D.,
Mount, D.W., Retondo, M.J., Weller, S.K., and Coen,
D.M. (1985) Proc. Natl. Acad. Sci. USA 82, 7969-
7973; Kouzarides, T., Bankier, A.T., Satchwell,
S.C., Weston, K., Tomlinson, P., and Barrell, B.G.
(1987) J. Virol. 61, 125-133; Earl, P.L., Jones,
E.V., and Moss, B. (1986) Proc. Natl. Acad. Sci.
USA 83, 3659-3663), human replicases (Wong, S.W.,
Wahl, A.F., Yuan, P.-M., Arai, N., Pearson, B.E.,
Arai, K.-I., Korn, D., Hunkapiller, M.W., and Wang,
T.S.-F. (1988) EMBO J. 7, 37-47) and DNA
polymerase α of S. cerevisiae (Pizzagalli, A.,
Valsasnini, P., Plevani, P., and Lucchini, G.
(1988) Proc. Natl. Acad. Sci. USA 85, 3772-3776).
Few homologies are evident with E. coli DNA
polymerases (Joyce, C.M., Kelley, W.S., and
Grindley, N.D.F. (1982) J. Biol. Chem. 257, 1958-
1964).

Class B DNA polymerases show conserved
motifs (Ito, J., and Braithwaite, D.K. (1991)
Nucleic Acids Res. 19, 4045-4057; Wong, S.W., Wahl,
A.F., Yuan, P.-M., Arai, N., Pearson, B.E., Arai,
K.-I., Korn, D., Hunkapiller, M.W., and Wang, T.
S.-F. (1988) EMBO J. 7, 37-47; Iwasaki, H., Ishino,
Y., Toh, H., Nakata, A., and Shinagawa, H. (1991)
Mol. Gen. Genet. 226, 24-33; Larder, B.A., Kemp,
S.D., and Darby, G. (1987) EMBO J. 6, 169-175;
Bernard, A., Zaballos, A., Salas, M., and Blanco,
L. (1987) EMBO J. 6, 4219-4225; Blanco, L.,
Bernard, A., Blasco, M.A., and Salas, M. (1991)
Gene 100, 27-38), which are also found in the
sequence of the invention, as shown in Figs. 2a and
2b, regions 1-8.

Regions 1, 2 and 3 correspond to EXO motifs
found in DNA polymerases with 3'-5' exonuclease
activity (Morrison, A., Bell, J.B., Kunkel, T.A.,
and Sugino, A. (1991) Proc. Natl. Acad. Sci. USA
88, 9473-9477), where three aspartic acid residues
and one glutamic acid residue are conserved.

SEQUENCE LISTING

(1) GENERAL INFORMATION: 

(i) APPLICANT: 

(A) NAME: Consiglio Nazionale delle Ricerche 

(B) STREET: P.le Aldo Moro 5 

(C) CITY: Roma 

(D) STATE: Italy 

(E) COUNTRY: Italy 

(F) POSTAL CODE (ZIP): 00185

(ii) TITLE OF INVENTION: Nucleotide sequences coding for a DNA 
polymerase, DNA polymerase and uses thereof 

(iii) NUMBER OF SEQUENCES: 2 

(iv) COMPUTER READABLE FORM: 

(A) MEDIUM TYPE: Floppy disk 

(B) COMPUTER: IBM PC compatible 

(C) OPERATING SYSTEM: PC-DOS/MS-DOS

(D) SOFTWARE: PatentIn Release #1.0, Version #1.25 (EPO)

(2) INFORMATION FOR SEQ ID NO: 1: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 3112 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(iii) HYPOTHETICAL: NO 



(iii) ANTI-SENSE: NO

(vi) ORIGINAL SOURCE:

(A) ORGANISM: Sulfolobus solfataricus

(ix) FEATURE:

(A) NAME/KEY: CDS

(B) LOCATION: 198..2846

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 1: 

ATCTGGTGTT TTTCTTTCTC ATGCATATTA ATAATGTTTA CTAAGATTCA AGGCATATCT 60 

CTTAAGAAAT GGCTAGATGA ATGAGAGGAG CAGGAGTAGC TTAAGAATCT TAAAACTTAG 120 

GTTCTTCATA AATGTCTATT TTTTCTCCCG CATTAAAACT TATAGCGTAT TTCTCAGAAA 180 

ATAATATATG TTAGAAA ATG ACT AAG CAA CTT ACC TTA TTT GAT ATT CCT 230 

Met Thr Lys Gln Leu Thr Leu Phe Asp Ile Pro
1 5 10

TCA TCT AAA CCC GCT AAG AGT GAA CAA AAT ACT CAA CAA TCG CAA CAG 278
Ser Ser Lys Pro Ala Lys Ser Glu Gln Asn Thr Gln Gln Ser Gln Gln
15 20 25

AGT GCT CCC GTT GAG GAA AAA AAG GTA GTT AGG AGG GAA TGG CTT GAA 326
Ser Ala Pro Val Glu Glu Lys Lys Val Val Arg Arg Glu Trp Leu Glu
30 35 40

GAG GCT CAG GAA AAT AAG ATA TAC TTC CTA TTG CAA GTA GAT TAT GAT 374
Glu Ala Gln Glu Asn Lys Ile Tyr Phe Leu Leu Gln Val Asp Tyr Asp
45 50 55

GGT AAG AAA GGT AAG GCT GTA TGT AAG CTA TTC GAT AAA GAA ACT CAA 422
Gly Lys Lys Gly Lys Ala Val Cys Lys Leu Phe Asp Lys Glu Thr Gln
60 65 70 75

AAG ATC TAT GCC CTA TAT GAT AAT ACT GGA CAT AAG CCC TAC TTT CTA 470
Lys Ile Tyr Ala Leu Tyr Asp Asn Thr Gly His Lys Pro Tyr Phe Leu
80 85 90

GTA GAT CTT GAA CCT GAT AAA GTA GGT AAA ATA CCT AAG ATT GTT AGA 518
Val Asp Leu Glu Pro Asp Lys Val Gly Lys Ile Pro Lys Ile Val Arg
95 100 105

GAT CCA TCT TTT GAT CAC ATA GAG ACT GTG AGT AAG ATA GAC CCG TAT 566
Asp Pro Ser Phe Asp His Ile Glu Thr Val Ser Lys Ile Asp Pro Tyr
110 115 120

ACT TGG AAT AAA TTC AAA TTA ACT AAA ATC GTT GTT AGA GAT CCC CAT 614
Thr Trp Asn Lys Phe Lys Leu Thr Lys Ile Val Val Arg Asp Pro His
125 130 135

GCA GTG AGA AGA TTA AGG AAT GAT GTT CCA AAA GCG TAT GAG GCT CAC 662
Ala Val Arg Arg Leu Arg Asn Asp Val Pro Lys Ala Tyr Glu Ala His
140 145 150 155

ATA AAA TAT TTT AAC AAC TAC ATG TAT GAC ATA GGT CTA ATC CCC GGT 710
Ile Lys Tyr Phe Asn Asn Tyr Met Tyr Asp Ile Gly Leu Ile Pro Gly
160 165 170

ATG CCT TAT GTT GTT AAG AAT GGG AAG TTA GAA AGT GTC TAT TTG TCT 758
Met Pro Tyr Val Val Lys Asn Gly Lys Leu Glu Ser Val Tyr Leu Ser
175 180 185

TTG GAC GAG AAA GAT GTT GAG GAG ATT AAG AAA GCC TTC GCT GAT TCA 806
Leu Asp Glu Lys Asp Val Glu Glu Ile Lys Lys Ala Phe Ala Asp Ser
190 195 200

GAT GAA ATG ACT AGA CAA ATG GCA GTC GAT TGG CTT CCC ATA TTT GAA 854
Asp Glu Met Thr Arg Gln Met Ala Val Asp Trp Leu Pro Ile Phe Glu
205 210 215

ACT GAA ATA CCT AAA ATA AAA AGG GTT GCG ATA GAT ATT GAG GTA TAT 902
Thr Glu Ile Pro Lys Ile Lys Arg Val Ala Ile Asp Ile Glu Val Tyr
220 225 230 235

ACA CCA GTT AAG GGT AGA ATC CCA GAC TCT CAG AAG GCT GAG TTT CCA 950
Thr Pro Val Lys Gly Arg Ile Pro Asp Ser Gln Lys Ala Glu Phe Pro
240 245 250

ATT ATA AGT ATA GCA TTA GCG GGG AGT GAT GGA TTA AAG AAG GTT CTT 998
Ile Ile Ser Ile Ala Leu Ala Gly Ser Asp Gly Leu Lys Lys Val Leu
255 260 265

GTA TTA AAT AGG AAT GAT GTC AAT GAA GGG AGT GTA AAA CTT GAT GGA 1046
Val Leu Asn Arg Asn Asp Val Asn Glu Gly Ser Val Lys Leu Asp Gly
270 275 280

ATA TCG GTT GAG AGA TTT AAT ACA GAG TAC GAA CTG TTA GGG AGA TTT 1094
Ile Ser Val Glu Arg Phe Asn Thr Glu Tyr Glu Leu Leu Gly Arg Phe
285 290 295

TTT GAT ATA CTG TTA GAA TAT CCG ATA GTT CTT ACA TTC AAT GGA GAC 1142
Phe Asp Ile Leu Leu Glu Tyr Pro Ile Val Leu Thr Phe Asn Gly Asp
300 305 310 315

GAT TTT GAT TTA CCT TAC ATT TAC TTT AGG GCG TTA AAG TTA GGT TAT 1190
Asp Phe Asp Leu Pro Tyr Ile Tyr Phe Arg Ala Leu Lys Leu Gly Tyr
320 325 330

TTT CCA GAG GAA ATT CCC ATA GAT GTA GCT GGT AAG GAT GAA GCC AAG 1238
Phe Pro Glu Glu Ile Pro Ile Asp Val Ala Gly Lys Asp Glu Ala Lys
335 340 345

TAT CTA GCT GGT CTT CAT ATA GAC TTG TAC AAA TTC TTC TTT AAT AAG 1286
Tyr Leu Ala Gly Leu His Ile Asp Leu Tyr Lys Phe Phe Phe Asn Lys
350 355 360

GCA GTG AGG AAT TAT GCA TTT GAG GGA AAG TAT AAT GAA TAC AAT TTA 1334
Ala Val Arg Asn Tyr Ala Phe Glu Gly Lys Tyr Asn Glu Tyr Asn Leu
365 370 375

GAT GCA GTT GCA AAG GCC TTA TTA GGG ACA TCA AAA GTT AAG GTA GAT 1382
Asp Ala Val Ala Lys Ala Leu Leu Gly Thr Ser Lys Val Lys Val Asp
380 385 390 395

ACG CTA ATA TCT TTC TTA GAT GTA GAA AAA TTA ATA GAA TAT AAC TTT 1430
Thr Leu Ile Ser Phe Leu Asp Val Glu Lys Leu Ile Glu Tyr Asn Phe
400 405 410

AGG GAT GCC GAA ATC ACA CTT CAG CTT ACT ACA TTT AAT AAC GAC CTA 1478
Arg Asp Ala Glu Ile Thr Leu Gln Leu Thr Thr Phe Asn Asn Asp Leu
415 420 425

ACT ATG AAG TTA ATT GTA TTG TTT TCT AGA ATT TCT AGA CTA GGA ATT 1526
Thr Met Lys Leu Ile Val Leu Phe Ser Arg Ile Ser Arg Leu Gly Ile
430 435 440

GAG GAA TTA ACT CGG ACA GAA ATA TCT ACT TGG GTA AAG AAT TTA TAT 1574
Glu Glu Leu Thr Arg Thr Glu Ile Ser Thr Trp Val Lys Asn Leu Tyr
445 450 455

TAT TGG GAA CAT AGA AAA AGA AAT TGG TTA ATT CCT CTT AAG GAA GAA 1622
Tyr Trp Glu His Arg Lys Arg Asn Trp Leu Ile Pro Leu Lys Glu Glu
460 465 470 475

ATC TTA GCG AAA TCC TCT AAT ATA AGA ACT TCT GCT CTA ATA AAG GGA 1670
Ile Leu Ala Lys Ser Ser Asn Ile Arg Thr Ser Ala Leu Ile Lys Gly
480 485 490

AAA GGA TAT AAA GGC GCA GTA GTT ATA GAC CCA CCT GCT GGA ATA TTC 1718
Lys Gly Tyr Lys Gly Ala Val Val Ile Asp Pro Pro Ala Gly Ile Phe
495 500 505

TTT AAC ATA ACT GTT TTA GAT TTT GCA TCA CTA TAT CCT TCA ATA ATT 1766
Phe Asn Ile Thr Val Leu Asp Phe Ala Ser Leu Tyr Pro Ser Ile Ile
510 515 520

AGA ACG TGG AAT CTT AGT TAC GAG ACT GTA GAC ATT CAA CAA TGT AAG 1814
Arg Thr Trp Asn Leu Ser Tyr Glu Thr Val Asp Ile Gln Gln Cys Lys
525 530 535

AAG CCC TAT GAA GTA AAG GAT GAG ACA GGG GAG GTG CTA CAT ATA GTT 1862
Lys Pro Tyr Glu Val Lys Asp Glu Thr Gly Glu Val Leu His Ile Val
540 545 550 555

TGC ATG GAT AGG CCA GGT ATA ACA GCA GTA ATA ACT GGG TTA CTA AGA 1910
Cys Met Asp Arg Pro Gly Ile Thr Ala Val Ile Thr Gly Leu Leu Arg
560 565 570

GAC TTC AGA GTA AAG ATA TAC AAA AAG AAA GCG AAG AAC CCT AAT AAT 1958
Asp Phe Arg Val Lys Ile Tyr Lys Lys Lys Ala Lys Asn Pro Asn Asn
575 580 585

AGT GAG GAA CAA AAA CTA CTC TAT GAC GTA GTA CAG AGA GCA ATG AAA 2006
Ser Glu Glu Gln Lys Leu Leu Tyr Asp Val Val Gln Arg Ala Met Lys
590 595 600

GTA TTC ATA AAT GCT ACT TAC GGT GTA TTT GGA GCT GAA ACA TTT CCG 2054
Val Phe Ile Asn Ala Thr Tyr Gly Val Phe Gly Ala Glu Thr Phe Pro
605 610 615

TTA TAT GCG CCA CGT GTA GCG GAG AGT GTT ACT GCA CTG GGG AGA TAC 2102
Leu Tyr Ala Pro Arg Val Ala Glu Ser Val Thr Ala Leu Gly Arg Tyr
620 625 630 635

GTT ATT ACC AGT ACC GTA AAG AAA GCT AGG GAA GAA GGT TTA ACT GTA 2150
Val Ile Thr Ser Thr Val Lys Lys Ala Arg Glu Glu Gly Leu Thr Val

640 645 650

TTA TAC GGT GAT ACT GAT TCT TTA TTC CTC CTT AAT CCT CCC AAG AAT 2198
Leu Tyr Gly Asp Thr Asp Ser Leu Phe Leu Leu Asn Pro Pro Lys Asn
655 660 665

AGT TTA GAA AAT ATT ATA AAA TGG GTT AAA ACT ACT TTC AAT TTA GAT 2246
Ser Leu Glu Asn Ile Ile Lys Trp Val Lys Thr Thr Phe Asn Leu Asp
670 675 680

TTG GAA GTT GAT AAA ACC TAC AAG TTT GTG GCT TTT TCT GGA TTG AAG 2294
Leu Glu Val Asp Lys Thr Tyr Lys Phe Val Ala Phe Ser Gly Leu Lys
685 690 695

AAG AAT TAC TTT GGA GTA TAC CAA GAC GGG AAG GTT GAT ATA AAG GGG 2342
Lys Asn Tyr Phe Gly Val Tyr Gln Asp Gly Lys Val Asp Ile Lys Gly
700 705 710 715

ATG TTA GTG AAG AAG AGA AAC ACG CCG GAA TTT GTA AAG AAG GTA TTT 2390
Met Leu Val Lys Lys Arg Asn Thr Pro Glu Phe Val Lys Lys Val Phe
720 725 730

AAC GAG GTA AAG GAG CTA ATG ATC TCC ATA AAC TCG CCA AAC GAT GTG 2438
Asn Glu Val Lys Glu Leu Met Ile Ser Ile Asn Ser Pro Asn Asp Val
735 740 745

AAG GAG ATT AAA AGA AAA ATT GTA GAC GTA GTT AAA GGA TCA TAT GAA 2486
Lys Glu Ile Lys Arg Lys Ile Val Asp Val Val Lys Gly Ser Tyr Glu
750 755 760

AAA CTA AAA AAC AAA GGA TAC AAT CTG GAC GAA TTA GCG TTT AAA GTA 2534
Lys Leu Lys Asn Lys Gly Tyr Asn Leu Asp Glu Leu Ala Phe Lys Val
765 770 775

ATG CTA TCG AAG CCT TTA GAT GCG TAC AAA AAG AAC ACT CCC CAA CAC 2582
Met Leu Ser Lys Pro Leu Asp Ala Tyr Lys Lys Asn Thr Pro Gln His
780 785 790 795

GTA AAG GCA GCT CTA CAA CTT AGA CCA TTT GGA GTT AAC GTA TTA CCA 2630
Val Lys Ala Ala Leu Gln Leu Arg Pro Phe Gly Val Asn Val Leu Pro
800 805 810

CGA GAT ATA ATA TAC TAT GTT AAG GTT AGA TCT AAA GAT GGA GTG AAA 2678
Arg Asp Ile Ile Tyr Tyr Val Lys Val Arg Ser Lys Asp Gly Val Lys
815 820 825

CCA GTA CAA CTA GCT AAA GTT ACT GAA ATA GAC GCA GAG AAA TAT TTA 2726
Pro Val Gln Leu Ala Lys Val Thr Glu Ile Asp Ala Glu Lys Tyr Leu
830 835 840

GAA GCG TTA AGA AGT ACG TTT GAG CAA ATC TTA AGG GCA TTC GGA GTC 2774
Glu Ala Leu Arg Ser Thr Phe Glu Gln Ile Leu Arg Ala Phe Gly Val
845 850 855

TCT TGG GAT GAG ATA GCA GCC AGA ATG TCG ATA GAT TCG TTC TTT TCA 2822
Ser Trp Asp Glu Ile Ala Ala Thr Met Ser Ile Asp Ser Phe Phe Ser
860 865 870 875

TAC CCA AGT AAA GGA AAT AGT TAATTAAGAA AGATAGCAAT TCTTCATAAT 2873
Tyr Pro Ser Lys Gly Asn Ser
880

AAATTTTTAG AAGCAATTTT TACCCACATA AGTTATAAAG ATTTTTAGAA AATTTAAATC 2933
GTATATTTTT ATTCTTCCTC CTCTTCCTCT AATTCTTCCT TTAATTCTTC TTGTTTCTGC 2993
ATACCCAAGT AAAGGAAATA GTTAATTAAG AAAGATAGCA ATTCTTCATA ATAAATTTTT 3053
AGAAGCAATT TTTACCCACA TAAGTTATAA AGATTTTTAG AAAATTTAAA TCGTATATT 3112

(2) INFORMATION FOR SEQ ID NO: 2:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 882 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 2: 

Met Thr Lys Gln Leu Thr Leu Phe Asp Ile Pro Ser Ser Lys Pro Ala
1 5 10 15

Lys Ser Glu Gln Asn Thr Gln Gln Ser Gln Gln Ser Ala Pro Val Glu
20 25 30

Glu Lys Lys Val Val Arg Arg Glu Trp Leu Glu Glu Ala Gln Glu Asn
35 40 45

Lys Ile Tyr Phe Leu Leu Gln Val Asp Tyr Asp Gly Lys Lys Gly Lys
50 55 60

Ala Val Cys Lys Leu Phe Asp Lys Glu Thr Gln Lys Ile Tyr Ala Leu
65 70 75 80

Tyr Asp Asn Thr Gly His Lys Pro Tyr Phe Leu Val Asp Leu Glu Pro
85 90 95

Asp Lys Val Gly Lys Ile Pro Lys Ile Val Arg Asp Pro Ser Phe Asp
100 105 110



His Ile Glu Thr Val Ser Lys Ile Asp Pro Tyr Thr Trp Asn Lys Phe
115 120 125

Lys Leu Thr Lys Ile Val Val Arg Asp Pro His Ala Val Arg Arg Leu
130 135 140

Arg Asn Asp Val Pro Lys Ala Tyr Glu Ala His Ile Lys Tyr Phe Asn
145 150 155 160

Asn Tyr Met Tyr Asp Ile Gly Leu Ile Pro Gly Met Pro Tyr Val Val
165 170 175

Lys Asn Gly Lys Leu Glu Ser Val Tyr Leu Ser Leu Asp Glu Lys Asp
180 185 190

Val Glu Glu Ile Lys Lys Ala Phe Ala Asp Ser Asp Glu Met Thr Arg
195 200 205

Gln Met Ala Val Asp Trp Leu Pro Ile Phe Glu Thr Glu Ile Pro Lys
210 215 220

Ile Lys Arg Val Ala Ile Asp Ile Glu Val Tyr Thr Pro Val Lys Gly
225 230 235 240

Arg Ile Pro Asp Ser Gln Lys Ala Glu Phe Pro Ile Ile Ser Ile Ala
245 250 255



Leu Ala Gly Ser Asp Gly Leu Lys Lys Val Leu Val Leu Asn Arg Asn
260 265 270

Asp Val Asn Glu Gly Ser Val Lys Leu Asp Gly Ile Ser Val Glu Arg
275 280 285

Phe Asn Thr Glu Tyr Glu Leu Leu Gly Arg Phe Phe Asp Ile Leu Leu
290 295 300

Glu Tyr Pro Ile Val Leu Thr Phe Asn Gly Asp Asp Phe Asp Leu Pro
305 310 315 320

Tyr Ile Tyr Phe Arg Ala Leu Lys Leu Gly Tyr Phe Pro Glu Glu Ile
325 330 335

Pro Ile Asp Val Ala Gly Lys Asp Glu Ala Lys Tyr Leu Ala Gly Leu
340 345 350

His Ile Asp Leu Tyr Lys Phe Phe Phe Asn Lys Ala Val Arg Asn Tyr
355 360 365

Ala Phe Glu Gly Lys Tyr Asn Glu Tyr Asn Leu Asp Ala Val Ala Lys
370 375 380

Ala Leu Leu Gly Thr Ser Lys Val Lys Val Asp Thr Leu Ile Ser Phe
385 390 395 400

Leu Asp Val Glu Lys Leu Ile Glu Tyr Asn Phe Arg Asp Ala Glu Ile
405 410 415

Thr Leu Gln Leu Thr Thr Phe Asn Asn Asp Leu Thr Met Lys Leu Ile
420 425 430

Val Leu Phe Ser Arg Ile Ser Arg Leu Gly Ile Glu Glu Leu Thr Arg
435 440 445

Thr Glu Ile Ser Thr Trp Val Lys Asn Leu Tyr Tyr Trp Glu His Arg
450 455 460

Lys Arg Asn Trp Leu Ile Pro Leu Lys Glu Glu Ile Leu Ala Lys Ser
465 470 475 480

Ser Asn Ile Arg Thr Ser Ala Leu Ile Lys Gly Lys Gly Tyr Lys Gly
485 490 495

Ala Val Val Ile Asp Pro Pro Ala Gly Ile Phe Phe Asn Ile Thr Val
500 505 510

Leu Asp Phe Ala Ser Leu Tyr Pro Ser Ile Ile Arg Thr Trp Asn Leu
515 520 525

Ser Tyr Glu Thr Val Asp Ile Gln Gln Cys Lys Lys Pro Tyr Glu Val
530 535 540

Lys Asp Glu Thr Gly Glu Val Leu His Ile Val Cys Met Asp Arg Pro
545 550 555 560

Gly Ile Thr Ala Val Ile Thr Gly Leu Leu Arg Asp Phe Arg Val Lys
565 570 575

Ile Tyr Lys Lys Lys Ala Lys Asn Pro Asn Asn Ser Glu Glu Gln Lys
580 585 590

Leu Leu Tyr Asp Val Val Gln Arg Ala Met Lys Val Phe Ile Asn Ala
595 600 605

Thr Tyr Gly Val Phe Gly Ala Glu Thr Phe Pro Leu Tyr Ala Pro Arg
610 615 620

Val Ala Glu Ser Val Thr Ala Leu Gly Arg Tyr Val Ile Thr Ser Thr
625 630 635 640

Val Lys Lys Ala Arg Glu Glu Gly Leu Thr Val Leu Tyr Gly Asp Thr
645 650 655

Asp Ser Leu Phe Leu Leu Asn Pro Pro Lys Asn Ser Leu Glu Asn Ile
660 665 670

Ile Lys Trp Val Lys Thr Thr Phe Asn Leu Asp Leu Glu Val Asp Lys
675 680 685

Thr Tyr Lys Phe Val Ala Phe Ser Gly Leu Lys Lys Asn Tyr Phe Gly
690 695 700

Val Tyr Gln Asp Gly Lys Val Asp Ile Lys Gly Met Leu Val Lys Lys
705 710 715 720

Arg Asn Thr Pro Glu Phe Val Lys Lys Val Phe Asn Glu Val Lys Glu
725 730 735

Leu Met Ile Ser Ile Asn Ser Pro Asn Asp Val Lys Glu Ile Lys Arg
740 745 750

Lys Ile Val Asp Val Val Lys Gly Ser Tyr Glu Lys Leu Lys Asn Lys
755 760 765

Gly Tyr Asn Leu Asp Glu Leu Ala Phe Lys Val Met Leu Ser Lys Pro
770 775 780

Leu Asp Ala Tyr Lys Lys Asn Thr Pro Gln His Val Lys Ala Ala Leu
785 790 795 800

Gln Leu Arg Pro Phe Gly Val Asn Val Leu Pro Arg Asp Ile Ile Tyr
805 810 815

Tyr Val Lys Val Arg Ser Lys Asp Gly Val Lys Pro Val Gln Leu Ala
820 825 830

Lys Val Thr Glu Ile Asp Ala Glu Lys Tyr Leu Glu Ala Leu Arg Ser
835 840 845

Thr Phe Glu Gln Ile Leu Arg Ala Phe Gly Val Ser Trp Asp Glu Ile
850 855 860

Ala Ala Thr Met Ser Ile Asp Ser Phe Phe Ser Tyr Pro Ser Lys Gly
865 870 875 880

Asn Ser

Claims

1. Nucleic acid of natural, recombinant or
synthetic origin, comprising a nucleotide sequence
coding for a polypeptide or fragments thereof having
a thermostable and thermophilic DNA polymerase
activity.

2. Nucleic acid according to Claim 1 wherein
said nucleotide sequence is derived from bacteria
of the Archaea domain.

3. Nucleic acid according to Claim 2 wherein
said nucleotide sequence is derived from bacteria
of the Sulfolobus genus.

4. Nucleic acid according to Claim 2 wherein
said nucleotide sequence is derived from bacteria
of the S. solfataricus species.

5. Nucleic acid according to any of previous
Claims wherein said polypeptide or fragments
thereof also have a 3'-5' exonuclease activity.

6. Nucleic acid according to Claim 5 wherein
said nucleotide sequence codes for the polypeptide
having the amino acid sequence of SEQ ID N2 or
fragments thereof.

7. Nucleic acid according to Claim 6 wherein
said nucleotide sequence codes for the polypeptide
having the amino acid sequence of SEQ ID N2 or
fragments thereof, with one or more amino acids
deleted or substituted, so that said DNA polymerase
activity is maintained.

8. Nucleic acid comprised in the sequence of
SEQ ID N1 characterized in that nucleotides 1 to
197 are a non-coding sequence, nucleotides 198 to
2843 code for a polypeptide with a thermostable and
thermophilic DNA polymerase activity, and
nucleotides 2844 to 3112 are a non-coding sequence.

9. Nucleic acid according to Claim 8 wherein
said coding sequence has one or more nucleotides
deleted or substituted, so that said DNA polymerase
activity is maintained.

10. Nucleic acid able to hybridize at least
at medium stringency to a nucleic acid according to
any of previous Claims.

11. Nucleic acid according to Claim 10
complementary to nucleotide sequences according to
any of Claims 1 to 9.

12. Polypeptide with a thermostable and
thermophilic DNA polymerase activity.

13. Polypeptide according to Claim 12
produced through recombinant DNA techniques by
nucleic acids according to any of previous Claims
from 1 to 11.

14. Polypeptide according to Claim 13
produced by the nucleotide sequence comprised in
SEQ ID N1.

15. Polypeptide according to Claim 14 having
a sequence comprised in SEQ ID N2.

16. Recombinant cloning or expression
vectors, of plasmid or viral derivation,
comprising nucleotide sequences according to any
of previous Claims from 1 to 11.

17. Recombinant vector according to Claim 16
being the plasmid pFCpolS (DSM N.7091).

18. Cells transformed with vectors
according to Claims 16 or 17.